A Speech Recognizer Based on Locally Recurrent Neural Networks
Authors
Abstract
Speech recognition systems (SRS) designed for low-cost products such as telephones, or for systems with energy constraints such as autonomous vehicles, must meet the demand for low-complexity solutions. A small vocabulary consisting of a few command words and the digits is sufficient for most applications, but it has to be recognized robustly. Here we report on investigations into the application of Recurrent Neural Networks to speaker-independent speech recognition. Fully Recurrent Neural Networks (FRNN) are used for feature scoring as well as for compensating variations in the duration of speech segments. Two SRS based on FRNN are discussed. First, a phoneme-based recognizer is investigated in which both feature scoring and time alignment are performed by FRNN. The performance of the FRNN used for feature scoring is compared to that of a TDNN with an optimized delay structure in order to evaluate the capability of FRNN to extract contextual information. The performance of the time-alignment FRNN is compared to that of Viterbi alignment procedures including different types of phoneme duration modeling. Second, an SRS consisting of a single FRNN is presented which directly classifies feature vector sequences and thus combines feature scoring and time alignment. To enable an efficient hardware implementation of the SRS we introduce Locally Recurrent Neural Networks (LRNN). LRNN are layered networks which have recurrent connections only between a neuron and its n nearest neighbours. The neurons of the input and output layers have unidirectional, sparse connections to the hidden layer. Thus, in comparison to FRNN, the density of connections is drastically reduced. In particular, long-distance wiring can be avoided in a hardware realization. Our experiments have shown that LRNN with recurrent connections to the 5 nearest neighbours of a neuron in the hidden layer achieve the same recognition performance as FRNN.
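The locality constraint described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the hidden neurons lie on a 1-D line, that "n nearest neighbours" means all neurons within index distance `k` on either side (self-connection included), and that the input weights are dense for brevity, whereas the abstract states they are sparse. The names `local_mask`, `count_connections`, and `lrnn_step` are hypothetical.

```python
import math


def local_mask(num_hidden, k):
    """Return a 0/1 matrix where mask[i][j] == 1 iff |i - j| <= k.

    Assumption: neurons sit on a line without wrap-around, so neurons
    near the edges have fewer neighbours than those in the middle.
    """
    return [[1 if abs(i - j) <= k else 0 for j in range(num_hidden)]
            for i in range(num_hidden)]


def count_connections(mask):
    """Count the recurrent connections that the mask keeps."""
    return sum(sum(row) for row in mask)


def lrnn_step(x, h_prev, w_in, w_rec, mask):
    """Compute one hidden-state update of a locally recurrent layer.

    h_next[i] = tanh( sum_j w_in[i][j] * x[j]
                    + sum_j mask[i][j] * w_rec[i][j] * h_prev[j] )

    Recurrent weights outside the mask never contribute, which is what
    removes long-distance wiring in this sketch.
    """
    h_next = []
    for i in range(len(h_prev)):
        total = sum(w * xj for w, xj in zip(w_in[i], x))
        total += sum(m * w * hj
                     for m, w, hj in zip(mask[i], w_rec[i], h_prev))
        h_next.append(math.tanh(total))
    return h_next
```

With a hypothetical hidden layer of 20 neurons and `k = 5`, `count_connections(local_mask(20, 5))` gives 190 recurrent connections, compared with 400 for a fully recurrent layer of the same size. The saving grows with layer size, because the number of local connections scales linearly in the number of neurons rather than quadratically.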
Similar resources
A Speech Recognizer with Low Complexity Based on Rnn
Speech recognition systems (SRS) designed for applications in low cost products like telephones or in systems with energetic constraints like autonomous vehicles are faced with the demand for solutions with low complexity. A small vocabulary consisting of a few command words and the digits is sufficient for most of the applications but has to be recognized robustly. Here we report about investiga...
Strategies for reducing the complexity of a RNN based speech recognizer
Recurrent Neural Networks (RNN) provide a solution for low cost Speech Recognition Systems (SRS) in mass products or in products with energetic constraints if their inherent parallelism could be exploited in a hardware realization. Actually, the computational complexity of SRS based on Fully Recurrent Neural Networks (FRNN), e.g. the large number of connections, prevents a hardware realization....
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Speech Recognition Using Neural Networks
Although speech recognition products are already available in the market at present, their development is mainly based on statistical techniques which work under very specific assumptions. The work presented in this thesis investigates the feasibility of alternative approaches for solving the problem more efficiently. A speech recognizer system comprised of two distinct blocks, a Feature Extrac...
Continuous Speech Phoneme Recognition Using Dynamic Artificial Neural Networks
Phoneme classification and recognition is the first step to large vocabulary continuous speech recognition. This step represents the acoustic modeling part of such a system. In hybrid speech recognition systems phoneme recognition is made by artificial neural networks (ANN’s). The main objective of this paper is the investigation of dynamic ANN’s, namely the Time-Delay Neural Networks (TDNN) an...